Using Fat-trees to Maximize the Number of Processors in a Massively Parallel Computer

Authors

M Valerio

L E Moser

P M Melliar-Smith

Abstract

We investigate the problem of maximizing the number of processors in a massively parallel computer when the degree of the internal nodes and the diameter of the network are physically constrained. The solution we propose for this problem is a fat-tree with processors at the leaves. The basic building block of this fat-tree is a two-level fat-tree. This two-level fat-tree is obtained from complete sets of mutually orthogonal Latin squares. As an application of this approach, we describe a novel interconnection network in which each internal node of the fat-tree is a ring. These rings are constructed using the integrated circuit QR0001 Data Stream Controller Interface, produced by National Semiconductor. Restricted to at most 16 interfaces per ring and to a network diameter of at most four, the resulting interconnection network has 51,984 processors.

1 Introduction

Interconnection networks for massively parallel computers have been studied in depth during the past few years [1, 5, 10, 15]. Interconnection networks with a multi-ring topology have not, however, been fully investigated, although they have been found to be well-suited for high-performance parallel architectures [11], and multiple rings have been used to synthesize general topologies [9]. The design of a multi-ring interconnection network must take into account physical limitations on the number of interfaces that can be accommodated on an individual ring and on the diameter of the network.

We distinguish between two types of interfaces, those that provide access to a processor and those that act as a switch. The function of a switch is to allow transfer of data from one ring to another. Given the physical limitations of the particular communication medium, the question we ask is: How can multiple rings be connected in a multi-ring interconnection network so as to maximize the number of processors? The issue of how the numbers of switches and rings increase with the number of processors is also examined.

We address the above question for interconnection networks, in general, and for a network based on the integrated circuit QR0001 from National Semiconductor, in particular. QR0001 is a high-speed communication switch with a data transmission rate of 180 MBytes/sec. It has been designed to accommodate a maximum of 16 interfaces on a ring with a maximum network diameter of four. The performance of the network depends heavily on the speed of the switches. QR0001 has high speed at low cost and is an option to …
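
The abstract states that the two-level fat-tree is obtained from complete sets of mutually orthogonal Latin squares. As a rough illustration of what such a set looks like, the Python sketch below builds one for a prime order p using the standard construction L_k[i][j] = (k*i + j) mod p. It is not the paper's own construction: the function names and the restriction to prime orders are assumptions made here for illustration, and the mapping from the squares to the fat-tree wiring is not given in this excerpt.

```python
from itertools import combinations


def complete_mols(p):
    """Return p - 1 mutually orthogonal Latin squares of order p.

    Assumption: p is prime, so every nonzero k is invertible mod p.
    Square number k has entry (k * i + j) mod p in row i, column j.
    """
    return [
        [[(k * i + j) % p for j in range(p)] for i in range(p)]
        for k in range(1, p)
    ]


def is_latin(square):
    """Check that every row and every column contains each symbol once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """Two squares are orthogonal if superimposing them gives n*n distinct pairs."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


if __name__ == "__main__":
    squares = complete_mols(5)
    print(len(squares), "squares of order 5")
    print("all Latin:", all(is_latin(s) for s in squares))
    print("pairwise orthogonal:",
          all(are_orthogonal(a, b) for a, b in combinations(squares, 2)))
```

A set of n - 1 mutually orthogonal Latin squares of order n is the largest possible, which is why such a set is called complete.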

Publication date: 1993
